Abstract

Background: Elucidation of protein-protein interaction (PPI) networks is important for understanding disease
mechanisms and for drug discovery. Tertiary-structure-based in silico PPI prediction methods have been developed
with two typical approaches: template matching with known protein structures and de novo protein docking.
However, the template-based method has a narrow applicable range because it depends on template information,
and the de novo docking-based method does not have good prediction performance. In addition, both of these
in silico prediction methods have insufficient precision and require validation of the predicted PPIs by
biological experiments, leading to considerable expenditure; therefore, PPI prediction methods with greater
precision are needed.

Results: We have proposed a new structure-based PPI prediction method by combining template-based prediction 
and de novo docking prediction. When we applied the method to the human apoptosis signaling pathway, we 
obtained a precision value of 0.333, which is higher than that achieved using conventional methods (0.231 for 
PRISM, a template-based method, and 0.145 for MEGADOCK, a non-template-based method), while maintaining an 
F-measure value (0.285) comparable to that obtained using conventional methods (0.296 for PRISM, and 0.220 for 
MEGADOCK). 

Conclusions: Our consensus method successfully predicted a PPI network with greater precision than conventional 
template/non-template methods, which may thus reduce the cost of validation by laboratory experiments for 
confirming novel PPIs from predicted PPIs. Therefore, our method may serve as an aid for promoting interactome 
analysis. 



Introduction 

Elucidation of regulatory relationships among the tens
of thousands of protein species that function in a
human cell is crucial for understanding the mechanisms
underlying diseases and for the development of medicines [1].
Predicting protein-protein interaction (PPI)
networks at the genome scale is one of the main topics
in systems biology.

The methods used for PPI network prediction include
primary-structure-based searching [2,3], evolutionary
information-based methods [4], and tertiary-structure-based
methods [5-7]. Tertiary-structure-based methods
are attracting attention because they provide predicted
protein complex structures and because they do not
depend on homologous proteins. Tertiary structural
information also provides powerful features for recognition
[8,9] and is therefore useful for predicting binding
affinity [10] in protein-protein complexes.

There are two typical approaches for tertiary-structure-based
PPI prediction: a method based on template
matching with known protein structures and another
method based on de novo protein docking. The template-based
method is based on the hypothesis that
known complex structures or interface architectures can
be used to model the complex formed between two target
proteins. The hypothesis is logical, and this method
provides good prediction performance when complex
structural information is available as a template; however,
if the template structure information is not available,
performance is poor. In addition, because the
interface architecture is not always similar for similar
interactions, the template-based method has a narrow
applicable range. In contrast, the de novo docking-based
method has a wide applicable range because it uses only
tertiary structural information. However, because it does
not exploit existing template information, its prediction
performance is poor.

Tuncbag et al. developed a template-based PPI prediction
method called PRISM [5], which is based on information
regarding the interaction surface of crystalline
complex structures. PRISM has been applied for predicting
PPIs in a human apoptosis pathway [11] and a p53-protein-related
pathway [12], and has contributed to the
understanding of the structural mechanisms underlying
some types of signal transduction. Ohue et al. developed
a PPI prediction method called MEGADOCK [6] and
Wass et al. developed a method [13] based on protein-protein
docking without interaction surface information.
MEGADOCK has been applied for PPI prediction for a
bacterial chemotaxis pathway [7,14] and has contributed
to the identification of protein pairs that may interact.

However, the prediction results of both template-based
and de novo docking-based methods in these studies contained
many false-positive predictions. PRISM obtained a
precision value of 0.231 when applied to a human apoptosis
pathway that consisted of 57 proteins, which was
higher than the precision obtained with random prediction
(precision value of 0.086), and MEGADOCK
obtained a precision value of 0.400 when applied to a
bacterial chemotaxis pathway that consisted of 13 proteins,
which was higher than the precision obtained with
random prediction (precision value of 0.253). To identify
new PPIs, the prediction results need to be validated
using biological experiments. For this purpose, obtaining
a low number of predicted interaction candidates with
high reliability is more important than obtaining a high
number of predictions with low reliability. Thus, this
paper aims to improve the reliability of the method used
to obtain PPI predictions.

In this study, we combined two different PPI prediction
methods to improve the precision of PPI prediction.
Because PRISM is a template-based method, its prediction
accuracy depends on the template dataset prepared, and only
PPIs whose interaction surface structures are conserved
are expected to be predicted. In contrast, MEGADOCK is
a non-template-based method (also called de novo prediction).
Its drawback is that it generates false-positives
for pairs that have no similar structures in known complex
structure databases, which a template-based method would
rule out from the prediction. However,
in situations where template structures are not present in
databases, MEGADOCK can still predict PPIs. This qualitative
difference between the two methods typically makes
their outputs different. Thus, combining both prediction
methods may improve prediction accuracy, as the
intersection set (AND set) of both results may contain
fewer false-positives; this improvement in precision would
also improve prediction reliability relative to
the use of just one method.

Such an approach is called a "meta" approach. Meta
approaches have already been used in the field of protein
tertiary structure prediction [15], and critical experiments
have demonstrated improved performance of meta predictors
when compared with the individual methods used in
the meta predictors. The meta approach has also provided
favorable results in protein domain prediction [16] and the
prediction of disordered regions in proteins [17]. We have
therefore proposed a new PPI prediction method based on
the consensus between template-based and de novo docking
methods. Generally, a meta prediction method may
have low applicability because it requires that the
conditions for applying every constituent method be met.
However, if structural information is available, the de novo
docking method introduced in this study is always applicable
with or without template information. Thus, the
applicability of the consensus method is not narrower
than that of a template-based method.

Materials and methods 

Template-based PPI prediction 

We used PRISM for template-based PPI prediction.
PRISM uses two input datasets: the template set and the
target set. The template set consists of interfaces extracted
from protein pairs that are known to interact. The target
set consists of protein chains whose interactions need to
be predicted. The two sides of a template interface are
compared with the surfaces of two target monomers by
structural alignment. If regions of the target surfaces are
similar to the complementary sides of the template interface,
then these two targets are predicted to interact with
each other through the template interface architecture.

The prediction algorithm consists of four steps:
(1) interacting surface residues of target chains are
extracted using Naccess [18]; (2) complementary chains
of template interfaces are separated and structurally
compared with each of the target surfaces by using
MultiProt [19]; (3) the structural alignment results are
filtered according to threshold values, and the resulting set
of target surfaces is transformed onto the corresponding
template interfaces to form a complex; and (4) the FiberDock [20]
algorithm is used to refine the interactions to
introduce flexibility, resolve steric clashes of side chains,
compute the global energy of the complex, and rank the
solutions according to their energies. When the computed
energy of a protein pair is less than -10 kcal/mol, the pair
is predicted to interact (personal communication with
Ms. Saliha Ece Acuner Ozbabacan, July 12, 2013). This
prediction protocol has been described in detail in
previous studies [5,11].

PPI prediction based on the de novo docking method 

For de novo protein docking-based PPI prediction, we
used MEGADOCK version 2.6.2 [7]. MEGADOCK does
not require template structures for prediction. The PPI
prediction scheme used in this study consists of two
steps. First, we conducted rigid-body docking calculations
based on a simplified energy function considering
shape complementarity, electrostatics, and hydrophobic
interactions for all possible binary combinations of proteins
in the target set. Using this process, we obtained a
group of high-scoring docking complexes for each pair
of proteins. Next, we applied ZRANK [21] to the predicted
complex structures for more advanced binding
energy calculation and re-ranked the docking results
based on ZRANK energy scores. The deviation of the
selected docking scores from the score distribution of
high-ranked complexes was determined as a standardized
score (Z-score) and was used to assess possible
interactions. This prediction protocol has been described
in previous studies [22,23]. Potential complexes that had
no other high-scoring interactions nearby were rejected
using structural differences. Thus, we considered as likely
binding pairs those that had at least one populated area of
high-scoring structures, one of which may be the true
binding site.
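
The standardized score described above can be illustrated with a minimal sketch. It assumes that a higher docking score is better and that the Z-score compares the best score with the mean and standard deviation of the high-ranked scores; the function names, the example scores, and the threshold are hypothetical and are not taken from MEGADOCK.

```python
from statistics import mean, pstdev
from typing import Sequence


def standardized_score(scores: Sequence[float]) -> float:
    """Z-score of the best score relative to the given high-ranked scores.

    Assumption: higher scores are better. Returns 0.0 for a flat distribution.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return 0.0
    return (max(scores) - mu) / sigma


def predict_interaction(scores: Sequence[float], threshold: float) -> bool:
    """Call a pair 'interacting' when its Z-score exceeds the threshold."""
    return standardized_score(scores) > threshold


# Hypothetical re-ranked scores for one protein pair (illustration only).
example_scores = [10000.0, 6200.0, 6100.0, 6000.0, 5900.0]
z = standardized_score(example_scores)
is_hit = predict_interaction(example_scores, threshold=1.5)
```

A pair whose best pose stands out from the rest of its high-ranked poses receives a large Z-score, whereas a pair with uniformly similar scores does not.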

Consensus prediction method 

In this study, we proposed a new meta-prediction method
based on the consensus between the two existing
prediction methods. The proposed method consists of two
steps: (1) PRISM and MEGADOCK are each applied to the
same target set, and (2) a target protein pair is predicted
to interact only when both PRISM and MEGADOCK predict
that the pair interacts. Although some true-positives
may be dropped by this method, the remaining
predicted pairs are expected to have higher reliability
because of the consensus between two prediction methods
that have different characteristics.
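
The consensus rule amounts to a set intersection over unordered protein pairs. The following is a minimal sketch; the pair names are hypothetical placeholders rather than actual predictions, and the union (OR) set is included only for comparison.

```python
from typing import FrozenSet, Set

Pair = FrozenSet[str]


def pair(a: str, b: str) -> Pair:
    """Unordered protein pair, so that (A, B) and (B, A) compare equal."""
    return frozenset((a, b))


def consensus_and(prism: Set[Pair], megadock: Set[Pair]) -> Set[Pair]:
    """AND set: pairs predicted to interact by both methods."""
    return prism & megadock


def consensus_or(prism: Set[Pair], megadock: Set[Pair]) -> Set[Pair]:
    """OR set: pairs predicted to interact by at least one method."""
    return prism | megadock


# Hypothetical predictions, for illustration only.
prism_hits = {pair("P1", "P2"), pair("P3", "P4")}
megadock_hits = {pair("P2", "P1"), pair("P5", "P6")}

and_set = consensus_and(prism_hits, megadock_hits)
or_set = consensus_or(prism_hits, megadock_hits)
```

Using frozenset for a pair avoids counting the same interaction twice when the two methods report the proteins in a different order.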

Dataset 

In this study, we focused on the human apoptosis signaling
pathway previously analyzed by PRISM because
our prediction results can thus be compared directly to
the results of the previous study. PRISM and MEGADOCK
are based on three-dimensional protein structures
and can therefore only be applied to proteins whose
tertiary structures are available. We therefore searched
the Protein Data Bank (PDB) (accessed on July 28, 2012)
for proteins involved in the human apoptosis pathway.
Within each group of structures that had high sequence
similarity (>0.9) with one another, we selected the
structures with the highest resolution [11]. After filtering according to
resolution and sequence similarity, we obtained 158 PDB
structures that corresponded to 57 proteins in the
human apoptosis pathway described in KEGG (KEGG
pathway ID: hsa04210) [24]. The PDB IDs in this structure
dataset were the same as those used by Ozbabacan
et al. [11]. Table 1 shows the list of PDB IDs and chains
of this dataset.

Known PPIs were collected from the STRING database
[25]. We used only experimental data in the literature
obtained from STRING with a confidence score >0.5. The
number of known PPIs was 137. Because the database
does not contain self-interactions, we did not predict
them. Thus, the number of target pairs
was C(57, 2) = 57 × 56 / 2 = 1,596.
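
As a quick arithmetic check, the number of target pairs and the precision expected from random prediction (the fraction of all pairs that are known PPIs) can be computed as follows. The variable names are illustrative.

```python
from itertools import combinations
from math import comb

n_proteins = 57      # proteins in the apoptosis pathway dataset
n_known_ppis = 137   # known PPIs collected from STRING

# Unordered pairs of distinct proteins (self-interactions excluded).
n_pairs = comb(n_proteins, 2)

# The same count obtained by explicit enumeration.
n_enumerated = sum(1 for _ in combinations(range(n_proteins), 2))

# Precision of a random predictor equals the fraction of positive pairs.
random_precision = n_known_ppis / n_pairs
```

Rounded to three decimals, the random precision agrees with the value of 0.086 quoted in the Introduction.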

Evaluation of prediction performance 

Here, we have defined #TP, #FP, #FN, #TN, precision, 
recall, and the F-measure, which we used to evaluate the 
prediction results: #TP is the number of predicted PPIs 
that were also found in STRING (true-positive), #FP is 
the number of predicted PPIs that were not in STRING 
(false-positive), #FN is the number of PPIs not predicted 
by the system even though the pair was found to interact 
in STRING (false-negative), and #TN is the number of 
negative predictions that were also not found in STRING 
(true-negative). Precision, recall, and the F-measure are 
represented as follows: 

\[
\mathrm{precision} = \frac{\#TP}{\#TP + \#FP}, \quad
\mathrm{recall} = \frac{\#TP}{\#TP + \#FN}, \quad
F\text{-measure} = \frac{2 \cdot \#TP}{2 \cdot \#TP + \#FP + \#FN},
\]

where the F-measure is the harmonic mean of precision 
and recall. To identify new PPIs in biological experiments 
after in silico screening, precision is more important than 
recall to reduce the cost of validation. 
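
These definitions translate directly into code. The sketch below is illustrative; the example uses the consensus counts given in the Figure 2 caption (#TP = 34, #FP = 68) together with the 137 known PPIs, from which #FN = 137 - 34 = 103 follows.

```python
from typing import Tuple


def evaluate(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = 2 * tp + fp + fn
    f_measure = 2 * tp / denom if denom else 0.0
    return precision, recall, f_measure


# Consensus prediction: #TP and #FP from Figure 2, #FN derived from 137 known PPIs.
tp, fp = 34, 68
fn = 137 - tp
precision, recall, f_measure = evaluate(tp, fp, fn)
```

The resulting precision and F-measure round to the values reported for the consensus method (0.333 and 0.285).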

Results and Discussion 

Comparison of template- and non-template-based 
methods 

Figure 1(a) and 1(b) show the prediction results for PRISM
and MEGADOCK, respectively, as applied to a human
apoptosis pathway. The threshold used for MEGADOCK
prediction was the one that yielded the best F-measure for this
dataset. The diagonal line (black cells) in Figure 1 indicates
self-interactions, which were not considered as prediction
targets. As shown in Figure 1, PRISM produced
fewer FPs than MEGADOCK. Table 2 shows the evaluation
of the prediction results. With MEGADOCK, we
obtained lower precision and higher
recall relative to PRISM. When the F-measure was evaluated
as a measure of overall performance, MEGADOCK
showed a lower value than PRISM. Predictions by MEGADOCK
contained more FPs because, in contrast to
PRISM, MEGADOCK does not restrict interface structures
to those found in template structures. Conversely,
PRISM obtained lower recall than MEGADOCK
because it only searched for interactions whose interface
structures could be found in the template set.

Table 1 Protein and PDB ID list of human apoptosis pathway dataset

Protein          PDB ID (_Chain)
...
Calpain1         1ZCM_A
...
IκBα             1IKN_D, 1NFI_E
...
PI3K(PIK3CA)     2ENQ_A, 2V1Y_A, 3HHM_A
PI3K(PIK3CG)     1E8Y_A
PI3K(PIK3R1)     1A0N_A, 1H90_A, 1PBW_A, 2IUG_A, 3I5R_A
PI3K(PIK3R2)     2KT1_A, 2XS6_A, 3MTT_A
PRKACA           3AGM_A
PRKAR2A          2IZX_A
TNFα             1A8M_A, 4TSV_A
TNF-R1           1EXT_A, 1ICH_A
...

The abbreviations used are: AIF, apoptosis-inducing factor, mitochondrion-associated, 1 (AIFM1); AKT1, RAC-alpha serine/threonine-protein kinase; AKT2, RAC-beta
serine/threonine-protein kinase; AKT3, RAC-gamma serine/threonine-protein kinase; APAF1, apoptotic peptidase activating factor 1; BCL-2, B-cell lymphoma 2;
BCL-XL, BCL extra-large; BID, BH3 interacting domain death agonist; Bax, BCL-2-associated X protein; CASP3/6/7/8/9, caspase-3/6/7/8/9; Cn(CHP), calcineurin B
homologous protein 1; Cn(CHP2), calcineurin B homologous protein 2; Cn(PPP3CA), protein phosphatase 3 catalytic subunit alpha isoform; Cn(PPP3R1), protein
phosphatase 3 regulatory subunit 1; CytC, cytochrome C; DFF40, DNA fragmentation factor, 40kDa, beta polypeptide; DFF45, DNA fragmentation factor, 45kDa,
alpha polypeptide; FADD, Fas-associated via death domain; FLIP, FLICE/CASP8 inhibitory protein (CASP8 and FADD-like apoptosis regulator, CFLAR); Fas, tumor
necrosis factor receptor (TNF) superfamily member 6; IAP, inhibitor of apoptosis; BIRC2/3/4, baculoviral IAP repeat-containing protein 2/3/4; IκBα, nuclear factor of
kappa light polypeptide gene enhancer in B-cells inhibitor alpha; IKK, inhibitor of nuclear factor kappa-B kinase; IL-1(A), interleukin-1 alpha; IL-1(B), interleukin-1
beta; IL-1R(1), type 1 interleukin-1 receptor; IL-1R(RAP), interleukin-1 receptor accessory protein; IL-3, interleukin-3; IL-3R, interleukin-3 receptor; IRAK2/4,
interleukin-1 receptor-associated kinase 2/4; MyD88, myeloid differentiation primary response protein MyD88; NF-κB(NFKB1), nuclear factor of kappa light
polypeptide gene enhancer in B-cells; NF-κB(RELA), nuclear factor of kappa light polypeptide gene enhancer in B-cells 3; NGF, nerve growth factor (beta
polypeptide); PI3K, phosphatidylinositide 3-kinase; PIK3CA, PI3K subunit alpha; PIK3CG, PI3K subunit gamma; PIK3R1, PI3K regulatory subunit alpha; PIK3R2, PI3K
regulatory subunit beta; PRKACA, cyclic adenosine monophosphate (cAMP)-dependent protein kinase catalytic subunit alpha; PRKAR2A, cAMP-dependent protein
kinase type II-alpha regulatory subunit; TNFα, tumor necrosis factor; TNF-R1, TNF receptor superfamily member 1A; TP53, cellular tumor antigen p53; TRADD, TNF
receptor type 1-associated death domain protein; TRAF2, TNF receptor-associated factor 2; TRAIL, TNF receptor superfamily member 10; TRAIL-R, TNF receptor
superfamily member 10B; TrkA, neurotrophic tyrosine kinase receptor type 1.

Ohue et al. BMC Proceedings 2013, 7(Suppl 7):S6
http://www.biomedcentral.com/1753-6561/7/S7/S6

Results of the consensus prediction 

Figure 2 shows the Venn diagram of the number of TPs
and FPs of the results of PRISM and MEGADOCK. A
large difference was observed in the results obtained by
the two methods. Thus, combining the prediction results
of PRISM and MEGADOCK may provide better performance
in PPI prediction. All of the predicted pairs of TPs
and FPs are shown in Table S1 in Additional File 1.

Figure 1(c) shows the prediction obtained from the consensus
between PRISM (a) and MEGADOCK (b); notably, the
number of FPs greatly decreased. The first row of
Table 2 shows that the consensus method obtained an
F-measure value of 0.285, which was comparable to the
PRISM result (F-measure = 0.296). The consensus method
achieved higher precision (0.333) than PRISM (0.231) and
yielded the highest precision value among the methods
shown in Table 2. This method is useful when validating
unknown PPI predictions using biological experiments. In
contrast, OR prediction demonstrated high recall (Table
2). Thus, the OR method will be useful when prediction
with high sensitivity is required, e.g., in the initial construction
of the draft PPI network from the relevant proteins.

An example of a false-positive pair and its predicted 
complex structure 

The caspase-3 and caspase-7 pair is shown as an example
of FP predictions in both PRISM and MEGADOCK
with a particularly high evaluation value. Both caspase-3
and caspase-7 are effector caspases, which belong to a
family of cysteine proteases that play essential roles in
apoptosis. Effector caspases are activated by initiator
caspases (e.g., caspase-2, 8, and 9), and then induce
apoptotic cell death. Although the initiator and effector
caspase cascade is well known, interactions among effector
caspases are disputed [26].

The interaction of caspase-3 and caspase-7 was predicted
with a high affinity score; the PRISM energy value
was less than -190 kcal/mol and the MEGADOCK docking
score was higher than 10,000. These values indicate a
strong predicted affinity. Figure 3 shows the predicted
complex structure for caspase-3 and caspase-7. The predicted
complex consists of 2DKO chain A (caspase-3, p17
subunit) and 2QL9 chain B (caspase-7, p10 subunit).

Additionally, 2DKO chain B (caspase-3, p12 subunit) is
structurally similar to 2QL9 chain B, and 2QL9 chain A
(caspase-7, p20 subunit) is structurally similar to 2DKO chain A.
Thus, the predicted complex shown in Figure 3, in which the
subunits are swapped, is similar to the original heterodimer
and was possibly predicted with a high score for that reason. The
interaction among effector caspases, as in this case, has not
been examined by biological experiments. In fact, another
PPI prediction tool based on template structure and database
information, PrePPI [28,29] (version 1.2.0), predicted
the pair of caspase-3 and caspase-7 with a high score (the
final probability value was 0.99). This situation is difficult
to avoid in large-scale prediction problems. However,
efforts such as the Negatome project [30] will help to
alleviate this difficulty in the future.

Figure 1 Apoptosis prediction by the (a) PRISM, (b) MEGADOCK, and (c) consensus methods. The green cells are true-positives, the red
cells are false-positives, and the purple cells are false-negatives. The diagonal cells (black cells) have no PPI information in the STRING database
and are excluded from the prediction targets.

Table 2 Accuracy of human apoptosis pathway prediction

Figure 2 Venn diagram of apoptosis pathway prediction
results. The common set (#TP = 34, #FP = 68) is denoted as
"Consensus".




Figure 3 Predicted complex structure of caspase-3 and
caspase-7. The red chain is caspase-3 (p17 subunit,
PDB ID: 2DKO, chain A) and the green chain is caspase-7
(p10 subunit, PDB ID: 2QL9, chain B). The complex structure is
the one ranked highest by MEGADOCK. This image was
produced using PyMOL software [27].



Relationship between the number of predicted positives 
and the number of structures 

The structure-based PPI prediction method may generate
positives with some bias regarding the type of proteins
(rows and columns of Figure 1). Table 1 and
Figure 1 suggest that proteins with a large number of
structures tend to generate more positive pairs. To verify this
tendency, the number of PDB chain structures used for
PPI prediction for each protein and the number of positive
predicted pairs containing that protein are plotted in Figure 4. The #TPs
are shown in Figure 4(a) and the #FPs are shown in
Figure 4(b). Pearson's correlation coefficient R and the
P-value for the t-test of the correlation coefficient are shown in
Table 3.
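
For reference, Pearson's correlation coefficient and the t statistic used to test it (with n - 2 degrees of freedom) can be computed as in the sketch below. The data points are hypothetical and are not the values plotted in Figure 4; converting the t statistic to a P-value additionally requires the t-distribution, which is omitted here.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient R between two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing R = 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1.0 - r * r))


# Hypothetical data: PDB chains per protein vs. positive predictions.
n_chains = [1, 2, 3, 4, 5]
n_positives = [2, 4, 5, 4, 5]

r = pearson_r(n_chains, n_positives)
t = t_statistic(r, len(n_chains))
```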

From the results of the t-tests, the number of chains
and the number of positive predictions were clearly correlated
with P < 0.05 in all cases, which suggests that the
structure-based PPI prediction method should handle
the number of protein structures used in a way that avoids bias. For
example, in a template matching-based method such as
PRISM, a protein pair with more structural conformations
will have more matches among template complexes and
a higher possibility of a predicted interaction. In Table 3,
the correlation coefficient values are particularly high for
FP predictions. Therefore, for more precise prediction,
we should consider one of two approaches: (i) generating
the target set without multiple conformations for
each protein or (ii) developing a correction method for
target sets that contain multiple conformations.

Performance evaluation with various sensitivity 
parameters 

In this study, we used a fixed threshold value for MEGADOCK
that provided the best F-measure value for
the target dataset. Figure 5 shows a plot of precision vs.
F-measure value for prediction results with various threshold
values for MEGADOCK. Figure 5 also plots the performance
of the consensus method with various threshold
values for MEGADOCK prediction while the threshold
value for PRISM prediction was fixed. When the threshold
value was changed in MEGADOCK, the plotted values
remained in the region of low precision (0.0-0.2), and
lower F-measure values were observed in the region of
higher precision because of the decreased recall value.
The consensus prediction method maintained a stable
F-measure value when the value of precision was approximately
0.2-0.3, although the performance in the high-precision
region (> 0.4) was inferior to that of MEGADOCK.
In this region, the consensus prediction provides a better
precision value than PRISM while maintaining the same
F-measure value. Figure 5 clearly shows that the performance
obtained by using the consensus method is better
over a wide range of threshold values than the prediction
obtained using only MEGADOCK.

Figure 4 Number of PDB chains vs. positive predictions. (a) shows the number of true-positives and (b) shows the number of false-positives.
The horizontal axis is the number of PDB chains used in the interaction prediction, and the vertical axis is the number of positives predicted by
using protein structures.

The AUC, i.e., the area under the ROC curve [31], is a
more general and effective statistical measure. The
ROC_0.1 curves, i.e., the ROC curves up to an
FP rate of 0.1, are shown in Figure 6. ROC curves were
created by plotting the TP rate (#TP/(#TP+#FN)) against
the FP rate (#FP/(#FP+#TN)). Regions with high FP rates
are not useful for prediction because many FPs are generated;
e.g., an FP rate of 0.2 corresponds to #FP = 292. The
ROC_0.1 curve was thus considered to favor methods that
produce a high TP rate at low FP rates, and the associated
area under the curve is referred to as AUC_0.1. A
perfect prediction will produce an AUC_0.1 of (0.1 × 1 =)
0.1, whereas a random prediction will result in an AUC_0.1
of (0.1 × 0.1/2 =) 0.005. Figure 6 shows that the consensus
prediction (AUC_0.1 = 0.023) is better than the
MEGADOCK (AUC_0.1 = 0.014) and random predictions
(AUC_0.1 = 0.005).
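
The partial area can be reproduced with a short sketch. It assumes the ROC curve is given as (FP rate, TP rate) points sorted by FP rate and integrates with the trapezoidal rule up to the cutoff; the function name and the two idealized curves are illustrative, not the curves of Figure 6.

```python
from typing import List, Tuple


def partial_auc(points: List[Tuple[float, float]], max_fpr: float = 0.1) -> float:
    """Trapezoidal area under an ROC curve for FP rates in [0, max_fpr].

    points: (fp_rate, tp_rate) pairs sorted by fp_rate, starting at (0, 0).
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the segment at the cutoff by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


perfect_curve = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
random_curve = [(0.0, 0.0), (1.0, 1.0)]

auc_perfect = partial_auc(perfect_curve)
auc_random = partial_auc(random_curve)

# Number of FPs at an FP rate of 0.2, given 1,596 pairs and 137 known PPIs.
n_negatives = 1596 - 137
fp_at_rate_0_2 = round(0.2 * n_negatives)
```

The two idealized curves reproduce the reference values of 0.1 and 0.005 given in the text, and the final lines reproduce the count of 292 FPs at an FP rate of 0.2.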

Conclusions 

In this study, we propose a new PPI network prediction
method based on the consensus between template-based
prediction and non-template-based prediction. The consensus
method successfully predicted the PPI network
more accurately than the conventional single template/
non-template method. Because such precise prediction
can reduce biological screening costs, it will promote
interactome analysis. For further improvement of prediction
performance, it is necessary to further improve
the combination of the two techniques, e.g., by using a
strategy other than taking a simple AND/OR consensus.
For example, biological information such as biochemical
function and subcellular localization information could
be used.

Table 3 Correlation coefficient R and P-value of
correlation test on Figure 4

Method       (a) #TPs R    (b) #FPs R
MEGADOCK     0.488         0.864
Figure 5 F-measure vs. precision for predictions when the
MEGADOCK threshold parameter is changed in the apoptosis
pathway prediction. The green triangle indicates the results of the
PRISM prediction (Table 2).